Two cases of primary plastid endosymbiosis are known. The first occurred ca. 1.6 billion years ago and
putatively gave rise to the canonical plastid in algae and plants. The second is restricted to a genus of
rhizarian amoebae that includes Paulinella chromatophora. Photosynthetic Paulinella species gained their
plastid from an α-cyanobacterial source and are sister to plastid-lacking phagotrophs such as Paulinella
ovalis that ingest cyanobacteria. To study the role of feeding behavior in plastid origin, we analyzed
single-cell genome assemblies from six P. ovalis-like cells isolated from Chesapeake Bay, USA. Dozens of
contigs in these cell assemblies were derived from prey DNA of α-cyanobacterial origin and associated
cyanophages. We found two examples of horizontal gene transfer (HGT) in P. ovalis-like nuclear DNA from
cyanobacterial sources. This work provides the first evidence of a link between feeding behavior in
wild-caught cells, HGT, and plastid primary endosymbiosis in the monophyletic Paulinella lineage.

The plastid in algae and plants almost certainly originated in the founding group of photosynthetic
eukaryotes, the Plantae (or Archaeplastida 1-4), and subsequently spread to all other major algal groups (e.g.,
diatoms, dinoflagellates, euglenids) through secondary and tertiary endosymbiosis 5,6. Primary plastid
acquisition occurred ca. 1.6 billion years ago 7, putatively through the phagotrophic engulfment and permanent
retention of a cyanobacterial endosymbiont 1. Plastid evolution resulted in the endosymbiotic gene transfer (EGT)
of hundreds of genes from the captured endosymbiont to the nucleus of the Plantae ancestor 8,9.

The photosynthetic amoeba Paulinella chromatophora 10 contains blue-green "chromatophores" (i.e., plastids)
and was first described by Robert Lauterborn 11. This genus has become a model for endosymbiosis research
because it is widely accepted as a second case of cyanobacterial primary endosymbiosis 12-16. Recent work shows
many examples of EGT to the amoeba nuclear genome from the α-cyanobacterium-derived (e.g., Prochlorococcus
and Synechococcus species) plastid 16-19. To understand the processes that led to plastid origin in photosynthetic
Paulinella, we focused on its plastid-lacking sister taxa. Three heterotrophic Paulinella species (P. ovalis,
P. intermedia, and P. indentata) are known 20-22. P. ovalis feeds on cyanobacteria, which have previously been
identified in its food vacuoles 20. This suggests that the primary plastid in the monophyletic lineage of
photosynthetic Paulinella 14 is likely to be the outcome of permanent maintenance of captured cyanobacterial prey,
as has been proposed for the origin of the Plantae plastid 1,4. Given conservation in prey choice and the widespread
abundance of α-Cyanobacteria in the oceans 23,24, it also is possible that members of this prokaryote clade may be
detected in the food vacuoles of heterotrophic Paulinella species. Because P. ovalis, although seasonally abundant
in nature 20, has not yet been successfully cultivated, it was until now not possible to generate genome data from
this lineage to test for the presence of prey DNA or prey-derived HGT. This fundamental problem was recently solved
with the development of single-cell genomic methods that allow the generation of draft genome data from cells
collected in the natural environment 25-28. These data not only provide insights into the genome of the targeted
cell but also identify the sources of foreign DNA present at the time of cell capture (e.g., from prey, pathogens,
or symbionts 28). Here we used single-cell genomics to generate draft assemblies from six P. ovalis-like cells
isolated from Chesapeake Bay, USA. Specifically, we tested the idea that the source of the plastid in photosynthetic
Paulinella reflects feeding behavior among its heterotrophic sister taxa.
Results

A water sample collected on May 30, 2009 from the dock of the
Smithsonian Environmental Research Center, Edgewater, MD,
USA, was used as input for flow cytometry. Single heterotrophic cells
<10 µm in size that lacked chlorophyll autofluorescence were
sorted. After whole genome amplification (WGA) of total DNA,
the taxonomic identity of the single-cell amplified genomes (SAGs)
was defined through analysis of the 18S rDNA sequence 29. This
showed that 10/48 SAGs were closely related to photosynthetic
Paulinella lineages (referred to here as P. ovalis-like; Figs. 1A, 1B).
Six of these SAGs that had identical small subunit rDNA sequences
(P. ovalis-like cells 1-6 [Fig. 1B]) were chosen for draft genome
sequencing using the Roche 454 GS-FLX system. This resulted in
180-308 Mbp of data from each of the cells that were used to
generate individual genome assemblies (see Supplementary Table
S1 online). Each assembly comprised several thousand contigs with
the total number of assembled bases ranging from ca. 3.5-7.2 Mbp,
with the exception of the data-poor P. ovalis-like cell 6 that had a
relatively small assembly of size 1.5 Mbp. All six assemblies were
used in BLASTx sequence similarity searches against a comprehensive
local database (see Methods and Supplementary Table S2) to
identify top hits (e-value < 10⁻⁵). The top hits were extracted and
their numbers normalized (Supplementary Figs. S1, S2) to minimize
the effect of uneven coverage bias introduced by multiple displacement
amplification used in WGA 28,30,31, resulting in the data shown in
Figure 1C.
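
The top-hit extraction and normalization can be illustrated with a short sketch. It is not the pipeline used in this study: it assumes that BLASTx results are available as (contig, target, e-value) tuples and that a lookup table maps each target to a phylum. The function names, the tuple layout, and the `phylum_of` table are hypothetical. Following the Methods, only one hit is kept per contig, and contigs hitting the same target are counted once.

```python
from collections import defaultdict


def normalized_phylum_counts(hits, phylum_of, max_evalue=1e-5):
    """Count unique top-hit targets per phylum.

    hits: iterable of (contig, target, evalue) tuples (assumed layout).
    phylum_of: dict mapping target name -> phylum (hypothetical lookup).
    """
    # Keep the single best (lowest e-value) hit per contig.
    best = {}
    for contig, target, evalue in hits:
        if evalue >= max_evalue:
            continue  # not significant under the chosen threshold
        if contig not in best or evalue < best[contig][1]:
            best[contig] = (target, evalue)

    # Normalize: contigs with the same top-hit target count as one.
    targets_by_phylum = defaultdict(set)
    for target, _ in best.values():
        targets_by_phylum[phylum_of[target]].add(target)
    return {phylum: len(t) for phylum, t in targets_by_phylum.items()}


def percentages(counts):
    """Convert per-phylum counts to percentages of the total."""
    total = sum(counts.values())
    if total == 0:
        return {}
    return {phylum: 100.0 * n / total for phylum, n in counts.items()}
```

Collapsing contigs onto unique targets is one simple way to keep a heavily over-amplified region from inflating the hit count of its phylum.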

An example of a cyanobacterium-derived DNA fragment in the P.
ovalis-like cell 1 assembly of the 454 data (contig 03412, length = 604
nt, 1449 reads) is shown in Figure 2A. This tree of a PstS phosphate
ABC transporter shows that cell 1 contains DNA that is derived from
α-Cyanobacteria (i.e., barring HGT of this gene into a non-cyanobacterial
cell). Note that a homolog of the gene is present in the
plastid (chromatophore) genome of the photosynthetic P. chromatophora
CCAC 0185 15. Analysis of the proteobacterial DNA in cell 1
showed that the majority of contigs had top hits to the marine bacterial
genus Pseudoalteromonas (i.e., Pseudoalteromonas sp. SM9913
Figure 1 | Evolutionary analyses of Paulinella ovalis-like SAGs. (a) Light microscopy image of the photosynthetic Paulinella chromatophora (left) and its
phagotrophic sister P. ovalis (right). (b) RAxML 48 tree (GTR + Γ + I model) inferred from 18S rDNA showing the phylogenetic position of P. ovalis-like
cells within Rhizaria. Single-cell sorting identified several P. ovalis-like cells that comprise two distinct heterotrophic Paulinella clades (Clade 1 and Clade
2), of which Clade 1 is most closely related to the photosynthetic P. chromatophora and Paulinella sp. FK01 14, and is the subject of our study. RAxML and
PhyML 49 bootstrap values are shown above and below the branches, respectively (only those > 60% are shown). The unit of branch length is the number
of substitutions per site. The GenBank accession numbers (where available) are shown after each taxon name. (c) Taxonomic distribution of unique
BLASTx hits (e-value < 10⁻¹⁰) using the contigs from the six P. ovalis-like SAGs for which we have 454 data. The percentage distribution of each
phylum across all six SAGs is shown. The arrows indicate markedly different phyletic origins of DNA among the SAGs.
Figure 2 | Bacterial DNA is present in the P. ovalis-like cell 1 SAG assembly. (a) Maximum likelihood (RAxML, WAG + Γ + F model) phylogeny of
PstS phosphate ABC transporter proteins. Cyanobacteria are in blue text, other Bacteria are in black text, the chromatophore (plastid) and the sequence
encoded on P. ovalis-like cell 1 contig 03412 are in magenta text, and cyanophage sequences are in dark green. The well-supported clade that includes
α-Cyanobacteria is identified with the dashed gray line. (b) Taxonomic distribution of BLASTx hits to Proteobacteria in the 454 assembly of P. ovalis-like cell
1. (c) Maximum likelihood (RAxML, WAG + Γ + F model) phylogeny of the transcription elongation factor NusA. P. ovalis-like cell 1 contig 00138 is
shown in magenta text. RAxML and PhyML bootstrap values (100 replicates) in 2A and 2C are shown above and below the branches, respectively (only
those ≥ 50% are shown). The unit of branch length is the number of substitutions per site. The NCBI "gi" numbers are shown after each taxon name.



[54 hits], P. tunicata D2 [30 hits], and P. haloplanktis [24 hits];
Fig. 2B). One of the proteins encoded on contig 00138 (length = 4847
nt, 182 reads) that had a top hit to Pseudoalteromonas sp. SM9913
was used to infer a phylogeny. This protein is the highly conserved
transcription elongation factor NusA (e-value 6.60 × 10⁻²⁸³),
and the tree shows a strongly supported monophyletic group comprising
the P. ovalis-like cell 1 NusA sequence and
Pseudoalteromonas/Alteromonadales taxa (Fig. 2C). A second open reading
frame on contig 00138 encodes the translation initiation factor IF-2,
which is also most closely related to Pseudoalteromonas species.
Despite this clear phylogenetic signal, given the high rates of HGT among
bacteria, it is not assured that we have identified the true taxonomic
source of the contig, nor is it clear whether single or multiple
Proteobacteria are present in cell 1 DNA.

To generate a more robust genome assembly from P. ovalis-like
SAGs, we produced additional sequence data from cells 1 and 2 using
an Illumina GAIIx instrument (see Methods). The Illumina data
were co-assembled with the 454 reads and subjected to the
BLASTx pipeline as described above. These results mirror the 454
data, with cell 1 showing a significantly larger number of proteobacterial
hits (Fig. 3; Supplementary Fig. S2) than cell 2. To estimate the
amount of coding DNA in the individual cell 1 and 2 combined (454
+ Illumina) assemblies, we determined the number of nucleotides
encoded on all contigs that had significant BLASTx hits. This showed
that the cell 1 and cell 2 contigs contained 2.6 and 4.3 Mbp of eukaryote
DNA, 1.8 and 0.9 Mbp of bacterial DNA, and 0.2 and 0.9 Mbp
of viral DNA, respectively (Supplementary Fig. S3). The annotations
(when present) for the top hits in the cell 1 and cell 2 contigs are
found in Supplementary Tables S3 and S4, respectively.
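
The tally of DNA by origin can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure used here: it assumes that the full length of each contig with a significant BLASTx hit is credited to the category of its top hit. The function name and the two input dictionaries are hypothetical.

```python
def mbp_by_category(contig_lengths, contig_category):
    """Sum contig lengths (nt) per top-hit category and report Mbp.

    contig_lengths: dict contig -> length in nucleotides.
    contig_category: dict contig -> category of its top BLASTx hit
        (e.g., "eukaryote", "bacteria", "virus"); contigs without a
        significant hit are simply absent from this dict.
    """
    totals = {}
    for contig, category in contig_category.items():
        totals[category] = totals.get(category, 0) + contig_lengths[contig]
    # Convert nucleotides to megabase pairs.
    return {category: nt / 1e6 for category, nt in totals.items()}
```

Counting only the aligned portion of each contig would be a stricter alternative and would yield smaller totals.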

Analysis of cyanobacterial and cyanophage gene fragments in the
combined assembly. We searched for DNA fragments derived from
cyanobacterial prey and associated phages in the combined
assemblies. This BLASTx analysis turned up 35 and 62 hits for cell
1, and 53 and 31 hits for cell 2 to Cyanobacteria and cyanophages,
respectively (see Supplementary Tables S3, S4 and Figs. 4A, 4B). A
RAxML tree inferred from a protein (bacterial porin, OprB) encoded
on one of the assembled fragments found in cell 1 is shown in
Figure 4C and identifies prey DNA that is related to α-Cyanobacteria.
The cyanobacterial fragment (contig 7191) is of length
11,565 nt and has an average coverage of 7,577x. Prediction of
open reading frames using MAKER 2
(http://derringer.genetics.utah.edu/cgi-bin/MWAS/maker.cgi) revealed 8 putative proteins (see
Supplementary Fig. S4) that encode porin, an ABC transporter subunit,
a putative histidine kinase, a hypothetical protein, a putative
p-pantothenate cysteine ligase, a HNH endonuclease family protein, a
ribonucleotide-diphosphate reductase subunit beta, and a putative
nicotinamide nucleotide transhydrogenase, all with cyanobacterial
top hits. The absence of introns and gene richness suggest a
prokaryotic origin of this contig.



SCIENTIFIC REPORTS | 2 : 356 | DOI: 10.1038/srep00356






Figure 3 | Taxonomic distribution of BLASTx hit numbers using the
contigs from P. ovalis-like cells 1 and 2 for which we have 454 + Illumina
data. (Horizontal bar chart; x-axis: number of BLASTx hits, 0 to 1200;
y-axis: taxonomic groups; legend: P. ovalis-like cell 1 and P. ovalis-like cell 2.)

The virus hits were studied to determine their putative taxonomic
distribution. These data (Fig. 4B) show that the four most frequently
recovered viral DNAs arise from cyanophages that infect
Prochlorococcus and Synechococcus lineages. These phages are presumably
associated with the different α-Cyanobacteria 20 in the cells
(Fig. 4A) or may be prey for P. ovalis-like cells. Cyanophage genomes
encode genes of photosystems I and II that manipulate photosynthetic
activity of the host to increase phage fitness 32-34. Therefore we
searched for contigs that encode these highly conserved genes in the
assemblies. One of the contigs we found in cell 1 (contig 13737) is of
length 16,010 nt and has an average coverage of 4,537x. Gene prediction
using this contig (done as described above) identified 13
proteins that all encode cyanophage gene products such as a class
II aldolase/adducin family protein (top hit, Synechococcus phage
S-SM2), a 6-phosphogluconate dehydrogenase (top hit, Synechococcus
phage S-RSM4), a photosystem II D1 protein (PsbA; top hit,
Synechococcus phage S-PM2), a ferredoxin (top hit, Prochlorococcus
phage Syn33), a virion structural protein (top hit, Synechococcus
phage S-RSM4), a plastocyanin (top hit, Synechococcus
phage S-SM2), and a photosystem II D2 protein (PsbD; top hit,
Synechococcus phage S-PM2), among others (see Supplementary
Fig. S5). The phylogeny of PsbD is shown in Figure 4D and demonstrates
the close phylogenetic relationship between the protein
encoded on contig 13737, cyanophage data available at NCBI
(www.ncbi.nlm.nih.gov/), and the α-cyanobacterial sister clade that
includes the plastid-encoded homologs in photosynthetic Paulinella
species. These data provide a direct link between a phagotroph, its
prey, phage that is associated with the prey (or is itself prey), and the
source of the plastid in its sister group, the photosynthetic Paulinella
clade 14,17-19.

Have cyanobacterial genes been integrated into the nuclear
genome of P. ovalis-like cells? We manually analyzed each
BLASTx hit of the combined assembly data listed in Supplementary
Tables S3 and S4 to search for contigs that encode a conserved
cyanobacterial protein that contains non-matching insertions,
presumably resulting from nuclear introns. This search turned up one
candidate in cell 1 (contig 4354, length = 1227 nt, average
coverage = 26x) that encodes a diaminopimelate (DAP) epimerase gene
containing a large insertion in the predicted gene. To extend this
contig, we used BLASTn to identify regions with partial overlap in
the 454 contigs from all six P. ovalis-like cell assemblies. This analysis
identified two contigs (cell 1 contig 02238, length = 943 nt, 10 reads
and cell 2 contig 00524, length = 2600 nt, 217 reads) that could be
co-assembled with contig 4354 to generate a high quality consensus
fragment (ConsensusPlus1618) of length 5970 nt that, when used to
map all sequence reads, had >100x coverage over most of the region
(see Fig. 5A). The short regions of zero coverage in Figure 5A are due
to repeated DNA that was masked by the assembler. Evidence that the
Illumina paired-end reads span these repeat regions is shown in
Supplementary Figure S6, demonstrating that the contig is continuous.
We also performed PCR using WGA-derived DNA from cell 1 and
recovered fragments of expected size that span the length of contig
ConsensusPlus1618, further validating the existence of this genomic
region.
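
The screen for conserved proteins interrupted by non-matching insertions was done manually in this study. A computational pre-filter could look like the sketch below, which is only an illustration. It assumes that the BLASTx alignment of one contig against one protein is available as forward-strand high-scoring segment pairs (HSPs) given as (contig start, contig end, protein start, protein end). The function name and both thresholds are hypothetical. A gap is reported when two consecutive HSPs are far apart on the contig but nearly adjacent on the protein, the pattern expected for an intron.

```python
def candidate_insertions(hsps, min_gap_nt=50, max_protein_skip=10):
    """Return contig intervals that look like intron-sized insertions.

    hsps: list of (q_start, q_end, s_start, s_end) tuples for one contig
        against one protein, forward strand only (assumed layout);
        q_* are nucleotide positions, s_* are amino acid positions.
    """
    ordered = sorted(hsps)  # order HSPs along the contig
    gaps = []
    for first, second in zip(ordered, ordered[1:]):
        _, q1_end, _, s1_end = first
        q2_start, _, s2_start, _ = second
        contig_gap = q2_start - q1_end - 1      # unaligned nt between HSPs
        protein_skip = s2_start - s1_end - 1    # residues skipped on protein
        if contig_gap >= min_gap_nt and abs(protein_skip) <= max_protein_skip:
            gaps.append((q1_end + 1, q2_start - 1))
    return gaps
```

Candidates flagged this way would still need the manual inspection, gene prediction, and phylogenetic analysis described in the text.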

Protein prediction of this contig using AUGUSTUS
(http://augustus.gobics.de/) and manual annotation identified three
protein-coding regions that contained multiple spliceosomal introns
(Fig. 5A, Supplementary Fig. S7). A dot plot analysis of the P. ovalis-like
DAP epimerase when compared to the plastid-encoded
homolog from P. chromatophora CCAC 0185 confirmed the presence
of intervening sequences in the eukaryotic protein read-through
product that correspond to the spliceosomal introns in this gene
(Fig. 5B). Phylogenetic analysis of two of these proteins demonstrates
that one (DAP epimerase [Fig. 5C]) originated via HGT from
an α-cyanobacterial source, whereas the second (a protein kinase
[Fig. 5D]) is of eukaryotic provenance. The third protein encoded
on contig ConsensusPlus1618 is a putative universal stress protein
that has a top BLASTp hit to a sequence from the human blood fluke
Schistosoma japonicum (i.e., is eukaryotic in origin). To test the
distribution of ConsensusPlus1618, we used the contig to map individual
454 reads from the six P. ovalis-like SAGs. This analysis
showed that all cells had reads that mapped to this contig, with data
from some (e.g., cells 1, 2, and 4) nearly spanning the entire fragment
(Supplementary Fig. S8). This suggests that the contig is likely to be
present in all of the genomes. Our data therefore provide direct
evidence for the integration of α-cyanobacterial DNA into the
chromosome of P. ovalis-like cells.
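
The logic of the dot plot comparison can be shown with a minimal word-match sketch. It is not the software used to produce Figure 5B; the function names, the word size, and the toy sequences in the test are hypothetical. Matching words fall on a diagonal, and an intervening sequence in one protein shifts the downstream matches to a new diagonal, so more than one diagonal offset indicates an insertion.

```python
def dot_plot(seq_a, seq_b, word=3):
    """Return (i, j) positions where seq_a[i:i+word] equals seq_b[j:j+word]."""
    # Index every word of seq_b by its start position.
    index = {}
    for j in range(len(seq_b) - word + 1):
        index.setdefault(seq_b[j:j + word], []).append(j)
    # Look up every word of seq_a in that index.
    return [
        (i, j)
        for i in range(len(seq_a) - word + 1)
        for j in index.get(seq_a[i:i + word], [])
    ]


def diagonal_offsets(dots):
    """Return the sorted set of diagonal offsets (i - j) among the dots."""
    return sorted({i - j for i, j in dots})
```

With a read-through translation that carries a five-residue insertion relative to its homolog, the matches before the insertion lie on offset 0 and those after it on offset 5.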

We identified a second putative cyanobacterium-derived gene in
P. ovalis-like cells that contains a large insertion when compared to
prokaryote homologs. The encoded protein (leucyl-tRNA synthetase)
is found on cell 2 contig 11624 (see Supplementary Fig. S9;
length = 7,564 nt, avg. coverage = 299x) that also encodes a nuclear
migration protein (nudC). Phylogenetic analysis demonstrates that
P. ovalis-like leucyl-tRNA synthetase is sister to Cyanobacteria and
monophyletic with oomycetes (Supplementary Fig. S10A). This is a
more ancient HGT event that may have been shared by the ancestor
of Rhizaria and stramenopiles (e.g., oomycetes), followed by widespread
loss in other members of these lineages. Alternatively and
more likely, based on the restricted distribution, these are independent
HGTs from a cyanobacterial source. This contig in cell 2 has high
sequence coverage (Supplementary Fig. S10B) and the neighboring
gene, a putative nudC homolog, is clearly of eukaryotic provenance
(Supplementary Fig. S10C).

Discussion 

A key characteristic that has been postulated to underlie plastid
endosymbiosis, and more generally genome evolution in eukaryotic
microbes, is long-term phagotrophy leading to HGT and ultimately
plastid acquisition 1,4,5,8,35. However, as appealing as these ideas may
Figure 4 | Cyanobacterial and cyanophage DNA identified in the combined (454 + Illumina) assemblies of P. ovalis-like cells 1 and 2. (a) Taxonomic
distribution of BLASTx hits to Cyanobacteria, using the cell 1 and 2 contigs. (b) Taxonomic distribution of BLASTx hits to virus sequences using the cell 1
and 2 contigs. (c) Maximum likelihood (RAxML, WAG + Γ + F model) phylogeny of bacterial porin (OprB) proteins. (d) Maximum likelihood
(RAxML, WAG + Γ + F model) phylogeny of photosystem II D2 (PsbD) proteins. In 4C and 4D, Cyanobacteria are in blue text, other Bacteria are in
black text, the chromatophore (plastid) and the P. ovalis-like cell data are in magenta text, cyanophage sequences are in dark green, Viridiplantae is in light
green text, red algae in red text, and chromalveolates in brown text. The well-supported clade that includes cyanophages is identified with the gray bar.
RAxML and PhyML bootstrap values (100 replicates) are shown above and below the branches, respectively (only those > 50% are shown). The unit of
branch length is the number of substitutions per site. The NCBI "gi" numbers are shown after each taxon name.



be, they cannot be tested directly with Plantae, whose plastid originated
deep in the tree of Cyanobacteria 36 about 1.6 billion years ago 7.
The Paulinella model therefore offers an opportunity to advance
knowledge of plastid origin in a more recent, independent case of
organelle origin in which the phagotrophic sister clade is available
for study. Here we show that P. ovalis-like SAG DNAs, although
clearly of eukaryote provenance (i.e., containing identical rDNA
sequences), harbor distinct pools of non-eukaryote sequence.
These amoebae are heterotrophs based on the sorting procedure that
excluded photosynthetic cells (see Methods) and the absence of
plastid DNA in the assemblies. Therefore at the time of capture,
the cells contained DNA from bacteria (and their associated phages)
as prey 28 in their food vacuoles or they ingested phage as prey. This
hypothesis is in line with the observation that P. ovalis feeds on
cyanobacteria 20 and therefore likely ingests other bacteria and large
phages as well. An alternative explanation is that the non-eukaryote
hits derive from contamination associated with the cell surface and
do not indicate intracellular DNA content. This interpretation is less
favored for two reasons. First, the single cell approach has a low risk
of DNA contamination from the sample matrix due to the small
volume of fluorescence-activated cell sorting (FACS) microdroplets
associated with each cell isolate; i.e., about 1-10 picoliters of the
sample matrix 37. Second, the different DNA compositions found
in each SAG (in particular, from Proteobacteria, Bacteroidetes/
Figure 5 | An example of α-cyanobacterial HGT found in the P. ovalis-like SAG data. (a) Intron distribution and coverage of P. ovalis-like genome
contig ConsensusPlus1618 that encodes three proteins. (b) Dot plot analysis of DAP epimerase from P. ovalis-like cells and the homolog that is encoded in
the plastid genome of P. chromatophora CCAC 0185 showing the intron positions in the P. ovalis-like sequence. (c) Maximum likelihood (RAxML, WAG
+ Γ + F model) phylogeny of diaminopimelate epimerase (DapF) proteins. (d) Maximum likelihood (RAxML, WAG + Γ + F model) phylogeny of
putative protein kinases. In 5B and 5C, Cyanobacteria are in blue text, other Bacteria are in black text, the chromatophore (plastid) and the P. ovalis-like
cell data are in magenta text, Viridiplantae is in green text, red algae in red text, and chromalveolates in brown text. RAxML and PhyML bootstrap values
(100 replicates) are shown above and below the branches, respectively (only those > 50% are shown). The unit of branch length is the number of
substitutions per site. The NCBI "gi" numbers (when available) are shown after each taxon name.



Chlorobi, and viruses [Figs. 1C, 3, 4A, 4B]) is not consistent with the
presence of a common cell surface contaminant shared by the captured
cells. Nevertheless, we cannot exclude the possibility that some
non-eukaryote DNAs may have originated from cells/virus particles
externally attached to the sorted cells.

These results raise the possibility that, given long-term phagotrophy
in heterotrophic Paulinella species, prey DNA might have been
integrated into the host nuclear genome 35. Phagotrophy is widespread
in "chromalveolates" and excavates and is widely regarded
as an explanation for the increased rate of HGT in these taxa 38-40.
The key difference between HGT as a general phenomenon among
protists and our study is that feeding behavior in Paulinella is tied
to a fundamental change in lineage evolution, plastid primary
endosymbiosis. In addition, EGT and HGT are known to be major
components of plastid establishment 8,9,41,42, but does HGT occur from
cyanobacterial prey prior to plastid endosymbiosis and could it play
a role in this process? Although we cannot yet answer the second
question with our data, we provide two examples of
cyanobacterium-derived HGT in the P. ovalis-like SAGs. The case of DAP
epimerase (DapF) is of particular interest because this gene is
derived from α-Cyanobacteria. DapF carries out the second to last
step in lysine biosynthesis in the DAP pathway. It is intriguing that
plants, which have a plastidial DAP pathway, encode a DapF gene of
cyanobacterial origin, whereas all other genes in this pathway
have proteobacterial or other affiliations 43. The functional implication
of a cyanobacterium-derived DapF gene in plastid-lacking
P. ovalis-like cells is, however, unknown given incomplete knowledge
of the DAP pathway in this lineage. Although the single cell genome
approach does not provide expression data, the presence of spliceosomal
introns and high sequence conservation suggest a functional
DapF in P. ovalis-like cells. Finally, we presume that the two
cyanobacterium-derived genes we uncovered in the P. ovalis-like SAG
genome data are not explained by a past photosynthetic history for
these taxa. If both the P. ovalis-like and photosynthetic
Paulinella lineages had once harbored a plastid, we would expect to find
a more substantial imprint of EGT from α-cyanobacterial
sources in the nuclear genome of the heterotrophic lineage 16,19.

In summary, single-cell genome analysis provides several novel
insights into phagotrophy and primary endosymbiosis in the
Paulinella clade. Most importantly, we provide strong evidence that
phagotrophic Paulinella feed on cyanobacterial prey derived from
the same clade that gave rise to the plastid ca. 60 Mya 15 in their
photosynthetic sister group. The high abundance of α-Cyanobacteria
in marine waters 44 likely explains this conservation in prey
choice that spans millions of years. Similar to what was found in the
single cell genome analysis of wild-caught picobiliphyte cells 28, P.
ovalis-like cells isolated from the natural environment show distinct
pools of non-eukaryote DNA, presumably derived from prey, symbionts,
or pathogens. The wide variety of non-cyanobacterial prokaryote
and viral DNA in the six cells also suggests that these (and likely
most) phagotrophs have access to diverse prey DNAs that can be
harnessed (e.g., via HGT 35) to support an incipient endosymbiosis or
other host functions. More generally, these data highlight the importance
of analyzing single cells in their natural environment to understand
protist-environment interactions.

Methods 

Sample preparation. A surface water sample was collected on May 30, 2009 from the
dock of the Smithsonian Environmental Research Center, Edgewater, MD, USA.
Samples were kept in the dark at in situ temperature until processing. Subsamples
(3 mL) were incubated for 10 min with Lysotracker Green DND-26 (75 nmol·L⁻¹;
Invitrogen), a pH-sensitive green fluorescing probe that stains food vacuoles in
protists 45. Target cells were identified and sorted using a MoFlo™ (Beckman-Coulter)
flow cytometer equipped with a 488 nm laser for excitation. Prior to sorting, the
cytometer was cleaned thoroughly with bleach. All tubes, plates, and buffers were
UV-treated prior to use to remove any DNA contamination. A 1% NaCl solution (0.2 µm
filtered and UV treated) was used as sheath fluid. The cleaning and preparation
techniques were as previously described 27,29.

Heterotrophic protists were identified by the presence of Lysotracker fluorescence
and the absence of chlorophyll fluorescence (Fig. 6). Forward scatter was also used to
select only the smaller protists that were ca. <10 µm in diameter. The sort criteria
were optimized for a Lysotracker region that contained 5-10% heterotrophic
Paulinella by positive microscopic identification, prior to single cell sorting.
Individual target cells were deposited into 96 well plates, where some wells were
dedicated for positive controls (10 cells/well) and negative controls (0 cells/well). All
wells on the microplates contained 5 µL of 1× PBS or Lyse-N-Go (Pierce). The sorted
microplates were centrifuged briefly and stored at -80°C.

Whole genome amplification. Cells deposited in PBS were lysed with cold KOH 27.
Cells deposited into Lyse-N-Go were lysed using a thermal cycle protocol provided by
the manufacturer. Cell lysate genomic DNA was amplified using multiple
displacement amplification (MDA 46,47). All MDA reactions contained 2 U/µL
Repliphi polymerase, 1× reaction buffer, 0.4 mM dNTPs, 2 mM DTT (Epicentre),
1 µM SYTO-9 (Molecular Probes) and 50 nM random hexamer primers (IDT).
Samples were incubated at 30°C for 6 h using a real-time thermal cycler with
fluorescence measured at 6 min intervals. The Repliphi polymerase was inactivated
by incubation for 3 min at 65°C, and the amplified DNA was stored at -80°C until
further processing. After whole genome amplification, the SAGs were screened by
PCR using conserved 18S rDNA primers to determine the phylogenetic origin of the
nucleic acids. The genomic DNA of the six selected SAGs was re-amplified using the
Repli-G midi kit (Qiagen) following the manufacturer's instructions. The products of the
second MDA reaction were de-branched with S1 nuclease to reduce chimeric
sequences generated during MDA 25 and purified with a PCR purification kit (Qiagen).

Genome sequencing and assembly. About 5 µg of genomic DNA derived from each
P. ovalis-like SAG with an A260/280 ratio of 1.85 was used for shotgun sequencing
with the GS-FLX Titanium platform (Roche) at the DNA Facility at the University of
Iowa (http://dna-9.int-med.uiowa.edu/). One-half of a picotitre plate was used to
generate sequence data from each sample, resulting in over 600,000 reads per sample
(Supplementary Table S1). All assemblies were generated with the native Roche




Figure 6 | Flow cytometric dot plot of the Lysotracker-stained field
sample (axis shown: log Lysotracker fluorescence, green). The heterotrophic
protist sort region (shaded green) was identified as containing high relative
green fluorescence (Lysotracker-stained food vacuoles) and low relative red
fluorescence (indicative of chlorophyll). Phototrophs (shaded red) have both
high chlorophyll fluorescence and Lysotracker fluorescence. A light microscopic
image of a P. ovalis-like cell is shown in the inset (the scale bar indicates 5 µm).

Newbler Assembler, versions 2.3 and 2.5.3. The read depth per contig for the individual
assemblies was determined by parsing the 454AlignmentInfo.tsv file, which is one of
the output files generated by the Newbler assembler. Read depth is defined as the
number of bases from all the reads used to assemble a contig divided by the contig
consensus length. All six assemblies were searched (BLASTx) against RefSeq release 45
(http://www.ncbi.nlm.nih.gov/RefSeq/) and other publicly available sources (Supplementary
Table S2). The top hits were extracted (leaving only 1 hit per contig). These were
organized according to their phyletic grouping. This grouping was normalized such
that all P. ovalis-like SAG contigs with hits to the same target (overlapping and
non-overlapping) were counted as one.
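
The read depth definition given above amounts to a single division, shown here as a minimal sketch. It does not parse the 454AlignmentInfo.tsv file, whose layout is not described in this paper; the function name and the input (a list of the number of bases each read contributes to a contig) are hypothetical.

```python
def read_depth(read_base_counts, consensus_length):
    """Bases from all reads in a contig divided by its consensus length.

    read_base_counts: list of the number of bases each read contributes
        to the contig (assumed input).
    consensus_length: length of the contig consensus in nucleotides.
    """
    if consensus_length <= 0:
        raise ValueError("consensus_length must be positive")
    return sum(read_base_counts) / consensus_length
```

For example, three reads contributing 400, 350, and 450 bases to a 600 nt contig give a read depth of 2.0.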

About 10 µg of WGA-derived DNA from P. ovalis-like cells 1 and 2 were each used
to construct a library (i.e., sheared DNA fragments of size 500 bp) for 150 bp ×
150 bp paired-end sequencing using an Illumina GAIIx instrument. Standard
Illumina protocols (http://www.illumina.com/) were used to generate the library. For
P. ovalis-like cell 1, a total of 46 million reads resulted in 4.7 Gbp of data that were
assembled into 14,091 contigs with an N50 of 1.2 Kbp, totaling 11.1 Mbp. For P. ovalis-like
cell 2, a total of 37 million reads resulted in 3.8 Gbp of data that were assembled
into 17,793 contigs with an N50 of 994 bp, totaling 12.3 Mbp. The 454 + Illumina
combined assemblies were done using the default settings and the CLC Genomics
Workbench tools (http://www.clcbio.com/).
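
The N50 values reported above summarize assembly contiguity: N50 is the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the assembled bases. The following sketch computes this statistic under the standard definition; the function name is hypothetical and the code is not taken from the assembler.

```python
def n50(contig_lengths):
    """Return the N50 of a list of contig lengths (standard definition)."""
    ordered = sorted(contig_lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig at which the running sum reaches half of the
        # assembly defines N50.
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must contain at least one contig")
```

Because N50 depends only on the sorted contig lengths, it can be recomputed from any assembly summary that lists contig sizes.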